cpp dev tools && clang tools

abstract

  1. C++ dev tools
  2. clang tools

C++ dev tools

why we need tools

C makes it easy to shoot yourself in the foot;
C++ makes it harder, but when you do it blows your whole leg off.

-- Bjarne Stroustrup

C++ is a powerful language, but it is also a complex one, so it is easy to introduce code smells into our project.
Code review (CR) helps keep code quality high, but on its own it is neither efficient nor sufficient.
CR is manual work: it costs too much time and energy to find bugs that are either trivial or well hidden.
So we need tools that find the simple bugs automatically and leave our attention for the important && interesting things.

what is good code

  • easy to read
  • easy to maintain
  • easy to use
  • work as expected
  • work fast

tools help us write good code

  1. formatter
  2. code generator
  3. code analyzer
  4. code refactor tools
  5. test framework
  6. benchmark (test) framework

formatter

why we need formatter

  • ensure consistent code style across the codebase
  • reduce time spent on code style discussions
  • keep CR reviewers focused on logic
  • automatically format code to save time

when we format code

  • save file
  • pre-commit hook
  • CI

formatter tools list

  • clang-format
  • astyle

code generator

why we need code generator

  • reduce time on boring work
  • keep code style guidelines
  • use common design patterns
  • avoid human error

when we use code generator

  • generate implementation snippets from an interface
  • generate ctor && dtor && copy ctor && move ctor && copy assignment && move assignment
  • generate getter && setter member functions
  • generate implementation snippets from a proto
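
The special member functions in the second bullet are a good example of boilerplate worth generating. Below is a rough sketch of what a generator might emit for a class that owns a heap array. The class name `Buffer` and all of its members are made up for this illustration, and real generators will differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical class owning a heap array; every special member below is
// the kind of boilerplate a generator can write for us.
class Buffer {
 public:
  explicit Buffer(std::size_t size) : size_(size), data_(new int[size]()) {}

  ~Buffer() { delete[] data_; }

  // Copy constructor: deep-copies the elements.
  Buffer(const Buffer& other)
      : size_(other.size_), data_(new int[other.size_]) {
    std::copy(other.data_, other.data_ + size_, data_);
  }

  // Move constructor: steals the pointer and leaves `other` empty.
  Buffer(Buffer&& other) noexcept
      : size_(std::exchange(other.size_, 0)),
        data_(std::exchange(other.data_, nullptr)) {}

  // Copy assignment via copy-and-swap, which also handles self-assignment.
  Buffer& operator=(const Buffer& other) {
    Buffer tmp(other);
    Swap(tmp);
    return *this;
  }

  // Move assignment: release our array, then take over `other`'s.
  Buffer& operator=(Buffer&& other) noexcept {
    if (this != &other) {
      delete[] data_;
      size_ = std::exchange(other.size_, 0);
      data_ = std::exchange(other.data_, nullptr);
    }
    return *this;
  }

  // Getter and setter style accessors.
  std::size_t size() const { return size_; }
  int at(std::size_t i) const {
    assert(i < size_);
    return data_[i];
  }
  void set(std::size_t i, int value) {
    assert(i < size_);
    data_[i] = value;
  }

 private:
  void Swap(Buffer& other) noexcept {
    std::swap(size_, other.size_);
    std::swap(data_, other.data_);
  }

  std::size_t size_;
  int* data_;
};
```

Writing these six functions by hand for every resource-owning class is exactly the boring, error-prone work the list above describes.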

code generator tools list

  • IDEs (Visual Studio, CLion, …)
  • protoc

code analyzer

why we need code analyzer (linter)

  • find undefined behavior && potential bugs automatically
  • find code smells automatically
  • enforce code style guidelines (modernize, readability, performance, …)
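
To make the last two bullets concrete, here is a small before/after sketch. The function names are invented for this example. The check names in the comments are clang-tidy checks that, as far as I know, report patterns like these, though the exact diagnostics depend on the tool version and configuration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Before: compiles cleanly, but a linter would likely complain.
// performance-unnecessary-value-param: `words` is copied on every call.
std::size_t TotalLengthBefore(std::vector<std::string> words) {
  // readability-container-size-empty: prefer words.empty().
  if (words.size() == 0) {
    return 0;
  }
  std::size_t total = 0;
  // modernize-loop-convert: an index loop that could be a range-based for.
  for (std::size_t i = 0; i < words.size(); ++i) {
    total += words[i].size();
  }
  return total;
}

// After: same behavior, written the way the checks suggest.
std::size_t TotalLengthAfter(const std::vector<std::string>& words) {
  if (words.empty()) {
    return 0;
  }
  std::size_t total = 0;
  for (const std::string& word : words) {
    total += word.size();
  }
  return total;
}

// Sanity check that the rewrite did not change behavior.
void CheckRewrite(const std::vector<std::string>& words) {
  assert(TotalLengthBefore(words) == TotalLengthAfter(words));
}
```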

static analyzer

  • Build warnings
  • Other tools
    • clang-tidy
    • coverity
    • cppcheck

dynamic analyzer

  • valgrind
  • address sanitizer

code refactor tools

what is refactor

  • Basic set
    • rename
    • extract function
  • Profound set
    • change function signature
    • push/pull data member up/down in class hierarchy
    • modernize

why we need refactor tools

You might say: we can refactor by hand or with a regex, so why do we need tools for it?

Just think about renaming:

// example of confusing names
struct stat stat; // stat is a struct name, but also a variable name
stat("file", &stat); // stat is also a function name
printf("%d", stat.size);

Suppose we want to rename the struct ‘stat’ to ‘Mystat’. How can we do it?
We can use a refactoring tool such as clang-rename, thanks to clangParse, clangSema, clangAST, and the other libraries that do the hard work.
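
To see why a text-based rename falls short, here is a small sketch that applies a whole-word regex replacement to the snippet above. `NaiveRename` and `CountOccurrences` are made-up helpers for this illustration, not part of any tool.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: renames every whole-word occurrence of `from`,
// with no idea whether it names a type, a variable, or a function.
// Assumes `from` is a plain identifier (no regex metacharacters).
std::string NaiveRename(const std::string& source, const std::string& from,
                        const std::string& to) {
  const std::regex word("\\b" + from + "\\b");
  return std::regex_replace(source, word, to);
}

// Counts non-overlapping occurrences of `needle` in `text`.
int CountOccurrences(const std::string& text, const std::string& needle) {
  assert(!needle.empty());
  int count = 0;
  for (std::size_t pos = text.find(needle); pos != std::string::npos;
       pos = text.find(needle, pos + needle.size())) {
    ++count;
  }
  return count;
}

// The confusing snippet, stored as plain text.
const std::string kSnippet =
    "struct stat stat;\n"
    "stat(\"file\", &stat);\n"
    "printf(\"%d\", stat.size);\n";
```

Calling `NaiveRename(kSnippet, "stat", "Mystat")` rewrites all five occurrences, although only the type name in `struct stat` should change. A tool built on the AST can tell the struct, the variable, and the function apart.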

code refactor tools list

  • IDEs (Visual Studio, CLion, …)
  • clangRefactor
  • clangMR(MapReduce)

test framework

Everyone knows testing is important, so I will not talk about why we need tests.

test framework list

  • Google Test
  • Boost.Test

benchmark (test) framework

what is benchmark

Example 1

For example, we want to compare the performance of two implementations.
The first uses std::unordered_map to store the data, and the second uses std::map.
We expect std::unordered_map lookups to be faster than std::map lookups, but we don’t know how much faster find is when each container stores 10000 ints and the code is built with -O2.
So we write benchmark code to compare them.

[Figure: benchmark result]

#include <benchmark/benchmark.h>
#include <unordered_map>
#include <map>
#include <random>

int RandomNumber() {
    static std::random_device rd;
    static std::mt19937 gen(rd());
    static std::uniform_int_distribution<> dis(1, 1000000);
    return dis(gen);
}

// Benchmark for std::unordered_map
static void BM_UnorderedMap_Read(benchmark::State& state) {
    std::unordered_map<int, int> unordered_map;
    for (int i = 0; i < 10000; ++i) {
        int num = RandomNumber();
        unordered_map[num] = num;
    }

    for (auto _ : state) {
        for (int i = 0; i < 10000; ++i) {
            benchmark::DoNotOptimize(unordered_map.find(RandomNumber()));
        }
    }
}
BENCHMARK(BM_UnorderedMap_Read);

// Benchmark for std::map
static void BM_Map_Read(benchmark::State& state) {
    std::map<int, int> map;
    for (int i = 0; i < 10000; ++i) {
        int num = RandomNumber();
        map[num] = num;
    }

    for (auto _ : state) {
        for (int i = 0; i < 10000; ++i) {
            benchmark::DoNotOptimize(map.find(RandomNumber()));
        }
    }
}
BENCHMARK(BM_Map_Read);

BENCHMARK_MAIN();

Example 2

We want to use std::string_view instead of const std::string& to pass a string parameter to a function.
But how much performance improvement can we get?
Suppose callers pass both const char* and std::string arguments to the function.
We write benchmark code to compare the two signatures in both cases.

[Figure: benchmark result]

#include <benchmark/benchmark.h>
#include <string>
#include <string_view>
#include <vector>
#include <random>

// Function that generates random strings
std::string GenerateRandomString(size_t length) {
    const std::string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    std::random_device random_device;
    std::mt19937 generator(random_device());
    std::uniform_int_distribution<> distribution(0, chars.size() - 1);

    std::string random_string;
    for (size_t i = 0; i < length; ++i) {
        random_string += chars[distribution(generator)];
    }

    return random_string;
}

// Function that takes std::string_view
void FunctionWithStringView(std::string_view str) {
    benchmark::DoNotOptimize(str.data());
}

// Function that takes const std::string&
void FunctionWithStringRef(const std::string& str) {
    benchmark::DoNotOptimize(str.data());
}

// Benchmark for std::string_view with const char*
static void BM_StringViewWithChar(benchmark::State& state) {
    std::string str = GenerateRandomString(100);
    const char* cstr = str.c_str();

    for (auto _ : state) {
        FunctionWithStringView(cstr);
    }
}
BENCHMARK(BM_StringViewWithChar);

// Benchmark for const std::string& with const char*
static void BM_StringRefWithChar(benchmark::State& state) {
    std::string str = GenerateRandomString(100);
    const char* cstr = str.c_str();

    for (auto _ : state) {
        FunctionWithStringRef(cstr);
    }
}
BENCHMARK(BM_StringRefWithChar);

// Benchmark for std::string_view with std::string
static void BM_StringViewWithString(benchmark::State& state) {
    std::string str = GenerateRandomString(100);

    for (auto _ : state) {
        FunctionWithStringView(str);
    }
}
BENCHMARK(BM_StringViewWithString);

// Benchmark for const std::string& with std::string
static void BM_StringRefWithString(benchmark::State& state) {
    std::string str = GenerateRandomString(100);

    for (auto _ : state) {
        FunctionWithStringRef(str);
    }
}
BENCHMARK(BM_StringRefWithString);

BENCHMARK_MAIN();

why we need benchmark

  • performance comparison: compare the performance of algorithms || data structures || implementations
  • performance optimization: identify the bottleneck
  • performance regression: ensure the performance is not worse than before
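
For comparison, this is roughly what measuring by hand looks like without a framework: a sketch using `std::chrono::steady_clock`. The names `LookupResult`, `TimeLookups`, and `CompareContainers` are invented here. A benchmark framework also takes care of iteration counts and reporting, which this sketch ignores, so treat a single measurement like this as noisy.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <unordered_map>
#include <vector>

struct LookupResult {
  std::size_t hits;                  // how many keys were found
  std::chrono::nanoseconds elapsed;  // wall-clock time for all lookups
};

// Times keys.size() lookups in any map-like container.
template <typename Map>
LookupResult TimeLookups(const Map& map, const std::vector<int>& keys) {
  std::size_t hits = 0;
  const auto start = std::chrono::steady_clock::now();
  for (int key : keys) {
    if (map.find(key) != map.end()) {
      ++hits;
    }
  }
  const auto stop = std::chrono::steady_clock::now();
  return {hits,
          std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}

// Fills both containers with the same data and checks that they agree,
// so any timing difference comes from the container, not the workload.
void CompareContainers(int size) {
  std::map<int, int> ordered;
  std::unordered_map<int, int> unordered;
  std::vector<int> keys;
  for (int i = 0; i < size; ++i) {
    ordered[i * 2] = i;  // even keys only, so some lookups miss
    unordered[i * 2] = i;
    keys.push_back(i);
  }
  const LookupResult a = TimeLookups(ordered, keys);
  const LookupResult b = TimeLookups(unordered, keys);
  assert(a.hits == b.hits);
}
```

Comparing the two `elapsed` values gives a first impression, and storing them between runs is the starting point for a regression check.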

benchmark framework tools

  • Google Benchmark (the library used in the examples above)


clang tools

clang family

  • clang: C, C++, Objective-C and Objective-C++ compiler
  • clang-format: code formatter
  • clang-tidy: code analyzer && code refactor tools
  • clang-refactor: code refactor tools
  • clangd: language server, supports code completion, go to definition, find references, rename, …

precondition when we use clang tools

Compile your source code with clang and make sure every file compiles successfully to an object file.
Then we can use clang tools to analyze && refactor our code.
Note: we only need the code to compile with clang :-), there is no need to use or deploy the outputs.

what can we do with clang tools

write our own checkers

For example, suppose our codebase is full of smelly code like the following, which passes all the clang-tidy checkers and compiles without any warnings.

// slowloop.cc
const char* str = GetSomeData();

// If the compiler doesn't hoist strlen(str) out of the loop, this loop runs in O(n^2),
// yet it doesn't generate any warnings.
for (int i = 0; i < strlen(str); ++i) {
    // do something
}

We can dump the clang AST to see how clang understands this code.

clang++ -Xclang -ast-dump -fsyntax-only slowloop.cc

We get an AST dump like this:
|   `-ForStmt 0x9759b40 <line:6:3, line:8:3>   // for (
|     |-DeclStmt 0x97599a8 <line:6:8, col:24>
|     | `-VarDecl 0x9759930 <col:8, col:23> col:15 used index 'size_t':'unsigned int' cinit   // size_t index = 0;
|     |   `-ImplicitCastExpr 0x9759998 <col:23> 'size_t':'unsigned int' <IntegralCast>
|     |     `-IntegerLiteral 0x9759970 <col:23> 'int' 0
|     |-<<<NULL>>>
|     |-BinaryOperator 0x9759ae8 <col:26, col:44> 'bool' '<'   // i < strlen(str);
|     | |-ImplicitCastExpr 0x9759ad8 <col:26> 'size_t':'unsigned int' <LValueToRValue>
|     | | `-DeclRefExpr 0x97599c0 <col:26> 'size_t':'unsigned int' lvalue Var 0x9759930 'index' 'size_t':'unsigned int'
|     | `-CallExpr 0x9759aa8 <col:34, col:44> 'size_t':'unsigned int'
|     |   |-ImplicitCastExpr 0x9759a98 <col:34> 'size_t (*)(const char *) __attribute__((cdecl))' <FunctionToPointerDecay>
|     |   | `-DeclRefExpr 0x9759a38 <col:34> 'size_t (const char *) __attribute__((cdecl))':'size_t (const char *)' lvalue Function 0x92edfb8 'strlen' 'size_t (const char *) __attribute__((cdecl))':'size_t (const char *)'
|     |   `-ImplicitCastExpr 0x9759ac8 <col:41> 'const char *' <LValueToRValue>
|     |     `-DeclRefExpr 0x9759a18 <col:41> 'const char *' lvalue Var 0x9759890 'str' 'const char *'
|     |-UnaryOperator 0x9759b20 <col:47, col:49> 'size_t':'unsigned int' lvalue prefix '++'   // ++i
|     | `-DeclRefExpr 0x9759b00 <col:49> 'size_t':'unsigned int' lvalue Var 0x9759930 'index' 'size_t':'unsigned int'
|     `-CompoundStmt 0x9759b30 <col:56, line:8:3>

Finally, we can write our own checker (an AST matcher pattern) to find this kind of smelly code.
forStmt(hasCondition(hasDescendant(callExpr(callee(functionDecl(hasName("strlen")))))))
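
As a sketch of what such a checker buys us, here is the kind of loop the pattern above is meant to flag, next to a rewrite it should leave alone. Both function names are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Flagged: strlen() appears inside the for-loop condition, so it may be
// re-evaluated on every iteration.
std::size_t CountSpacesSlow(const char* str) {
  assert(str != nullptr);
  std::size_t spaces = 0;
  for (std::size_t i = 0; i < std::strlen(str); ++i) {
    if (str[i] == ' ') {
      ++spaces;
    }
  }
  return spaces;
}

// Not flagged: the length is computed once, before the loop.
std::size_t CountSpacesFast(const char* str) {
  assert(str != nullptr);
  std::size_t spaces = 0;
  const std::size_t length = std::strlen(str);
  for (std::size_t i = 0; i < length; ++i) {
    if (str[i] == ' ') {
      ++spaces;
    }
  }
  return spaces;
}
```

Both functions return the same result; the second just avoids depending on the optimizer to hoist the call out of the condition.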

For more info, you can see